Tolerating First Level Memory Access Latency in High-Performance Systems
نویسندگان
چکیده
In order to improve performance, future parallel systems will continue to increase the processing power of each node in a system. As node processors, though, can execute more instructions concurrently, they become more sensitive to the rst level memory access latency. This paper presents a set of hardware and software techniques, collectively referred to as register preloading, to effectively tolerate long rst level memory access latency. The techniques include speculative execution, loop unrolling, dynamic memory disambiguation, and strip-mining. Results show that register preloading provides excellent tolerance to rst level memory access latency up to 16 cycles for an issue 4 node processor.
منابع مشابه
TOLERATING FIRST LEVEL MEMORY ACCESS LATENCYIN HIGH - PERFORMANCE SYSTEMSWilliam
In order to improve performance, future parallel systems will continue to increase the processing power of each node in a system. As node processors, though, can execute more instructions concurrently, they become more sensitive to the rst level memory access latency. This paper presents a set of hardware and software techniques , collectively referred to as register preloading, to effectively ...
متن کاملA performance evaluation of cache injection in bus-based shared memory multiprocessors
Bus-based shared memory multiprocessors with private caches and snooping write-invalidate cache coherence protocols are dominant form of smallto medium-scale parallel machines today. In these systems the high memory latency poses the major hurdle in achieving high performance. One way to cope with this problem is to use various techniques for tolerating high memory latency. Software-controlled ...
متن کاملRelative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques
In both hardware-only and software-only directory protocols the performance is often limited by memory access stall times. To increase the performance, several latency tolerating and reducing techniques have been proposed and shown effective for hardware-only directory protocols. For software-only directory protocols, the efficiency of a technique depends not only on how effective it is as seen...
متن کاملEffects of Multithreading on Cache Performance
ÐAs the performance gap between processor and memory grows, memory latency becomes a major bottleneck in achieving high processor utilization. Multithreading has emerged as one of the most promising and exciting techniques used to tolerate memory latency by exploiting thread-level parallelism. The question, however, remains as to how effective multithreading is on tolerating memory latency. The...
متن کاملLow Power System Design by Combining Software Prefetching and Dynamic voltage Scaling
Performance-enhancement techniques improve CPU speed at the cost of other valuable system resources such as power and energy. Software prefetching is one such tech21 nique, tolerating memory latency for high performance. In this article, we quantitatively study this technique’s impact on system performance and power/energy consumption. 23 First, we demonstrate that software prefetching achieves...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1992